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Abstract 

In this document, we show how the different quantities necessary to compute kernel quantum prob- 
abilities can be computed. This document form the basis of the implementation of the Kernel Quantum 
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*~j . Probability (KQP) open source project 1 . 

o 

^ ■ 1 Introduction 

rs\ • Quantum Probabilities correspond to one of the generalisation of standard probabilities. It is founded on 

the mathematical theory underlying Quantum Physics. This framework was developed in the 1930s by 
von Neumann and Dirac. It was recently further developed and generalised by the so-called "sequential 
effect algebra" [3]. The Kernel Quantum Probability library (KQP) aims to provide tools to effectively 
compute "quantum probabilities", that is to compute a representation of densities, events and to update the 
densities when events are observed (conditionalisation). It also provides access to generalisation of standard 

(~N . probabilistic measure like entropy and divergence [■'>]. 

j^T ' Computing quantum probabilities related quantities relies on linear algebra, and more precisely on the 

^-\ . definition of an inner product in a Hilbert space, since this defines the probability of transition (when 

f^*) ' measuring) between possible system states. 

VO . In the machine learning community, a standard "trick" is to use a kernel to define the inner product [?] . 

^fs ' That is, states can be represented in an arbitrary feature space T for which there exists a mapping $ such 

that $(.t) • $ (y) is valid inner product. We call k(x,y) = $(x) • $(y) the kernel, which can be computed 

Cn ' without explicitly computing &(x), thus allowing to work in high or infinite spaces. 

This documents describe how to compute quantum probabilities related quantities relying only on the 
inner product definition given by the kernel. The organisation of this document is as follows: 



1. In Section 2, we describe how to compute probabilities and how to update the probabilities given a 
subspace (or its orthogonal), both from a theoretical point of view and implementation point of view. 

2. In Section 3.2, we describe how to compute an approximation of quantum densities or events. 

3. In Section 4, we give an example of code using KQP in C++. 

Notations 

We suppose that we work within a complex Hilbert space, that is that the field is C unless otherwise specified. 
The set of complex matrices of dimensions n by p is denoted A4 nxp . 

In order to deal with kernels, following the literature, data points in the original space will be called 
pre-images since they are used to build a basis of the subspace containing the quantum density or event (see 
Section 2). 
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In order to use common linear algebra notations, we consider a list of pre-images as a linear map, and 
use uppercase calligraphic letters to denote a list of pre-images: A list X of n pre-images is denoted 



X g M% } 



,(«) 



An arbitrary list belongs to M. n = \J n M.\i ■ A linear combination of pre-images is simply denoted XA 
where A e M n x P - 

We define the adjoint operator in a natural way, i.e. it maps a list of pre-images into 

X^ €U p c(M^;M n xp 

We denote k the kernel, i.e. we denote 

k(X,U) = X^U eM nxp 

with X e M^ and U 6 M$ . 

Finally, we use the symbol V to denote the composition of a linear operator with its transpose, i.e. 

V (A) = AA^ 

2 Computing probabilities 

Readers are referred to [3, 4] for a discussion and presentation of what are quantum probabilities. Shortly, 
they can be defined by: 

A quantum density is a positive semi-definite self-adjoint linear operator p of trace 1; 

An Observable is a projector and corresponds to a yes-no measurement, i.e. to a quantum "event"; 

An effect is an operator A such that < (Ax,x) < 1. Note that an observable is an effect, but the reverse 
is not true. Effects can be considered as "fuzzy" or "imprecise" observables. 

In this section, we present the formulas corresponding to the various quantities of interest (probability, 
conditionalisation, divergence) and how we can compute them within KQP. This section is based on the 
work of Gudder [3] (effects) and [6] (divergence and entropy). 

In this section, we suppose we have a density p and an effect E € 8 (7i) that can be decomposed as: 

p = V {XpYpZp) and E = V (X E Y E Z E ) 

where X, belong to A4 n ,Y, and Z, to M.. 

For some operations, we need the decomposition to be in an orthonormal form, i.e. that 

Y^X ] XY = Id (1) 

When using the orthonormality hypothesis, we use the symbol o[. . .] over the equality. For example, 

Y}XIX P Y P °k ] Id 

Finally, for densities we use the proportionality to denote that it should be normalised, i.e. p oc p u means 
that 

Pu 

tr (p u ) 
Note that it is straightforward to compute the normalisation factor, since, using the cyclic re-ordering 
property of trace operators: 

tr (p) = tr (XpYpTffiX}) = tr (T*Y}x}X p Y p ) 

If the decomposition is orthonormal (Eq. 1), we have tr (p) = tr (£?) = ||S P || . 

For a matrix A, we denote A,j its j th column, A^. its i row. If a matrix A as an subscript p, we use a 
semicolon to separate the subscript from the column/row indices, as for example in A p; ij. 



2.1 Computing probabilities 

The probability of an effect E is defined as 



Pr p (E)=t r (pE) 



We can compute the probability of an effect E using the re-ordering property of the trace operator 

Pr p (E) = tr (pE) = \\z E Y E k (X E , X p ) Y p £ 

2.2 Entropy 

The entropy of a density p (with an orthonormal decomposition) can be written [i i] : 

tr(plog(p))) °L P] tr ((Y}X}X P Y P ) S p (Y}x}X p Y p ) log (X$) 
= tr(£>g(^)) 
= 532E»„log(E Pi «) 



2.3 Divergence 

Umegaki [(>] proved that the equivalent of the Kullback-Leilbler divergence between two densities p and r 
can be computed as: 

J(p\\r) = tr(plog(p) - plog(r))) 

The first part corresponds to the entropy, and the second part can be computed as follows. In order to 
deal with infinities, in practice we want to compute the divergence using r' = (1 — e) r + eald where aid is 
a blank noise, i.e. tr (aid) = 1. In case of infinities, a can be set to a small value. We have: 

tr(plogr') =tr(plog((l -e)r + eald)) 

°^ ] tr (p [X T Y T log ((1 - e) Y? T + eald) Yj X} + log (ea) (id - X T Y T Y^X\)\ ) 

= tr (£ p Y}XlX T Y T log ((1 - e) T, 2 T + eald) Y}X}X p Y p E p ) + log (ea) (tr (p) - tr ( P X T Y T Y^X\)) 

E p Y}x}XrY T (log ((1 - e) £* + ea/d)) 1/2 | 2 + log (ea) (l - \\H ^ X} X T Y T 

2.4 Conditionalisation 

We first give the formulas to compute the conditional quantum density when observing an effect E, and then 
when observing its orthogonal E 1 - 

2.4.1 Projecting on the effect E 

If we observe the event E, the density p conditioned upon E, denoted p > E, is given by: 

E l / 2 pE 1 ' 2 



pt>E = 



tr (pE) 



We can focus on the numerator since we only have to normalise the resulting density afterwards. We 
have 

p > E ex E 1/2 pE 1/2 

We can distinguish two cases: 



1. E is an observable: since E = E 1 / 2 , we have 



p \>e = v 



X E (Y E T;%Y*k (x E ,x p ) r p ) n p 



2. E is a "strict" effect: In this case, we require an orthonormal decomposition for E, and we can compute 
the projection as: 

p>e o[ = ] p \x E (Y E n E Y E k (x E , x P ) y p ) s p 

In both cases, the resulting density is not in an orthonormal form. 

2.4.2 Projecting on the orthogonal E 1 - 

If we observe the orthogonal of event E, we can update our knowledge on p, denoted p > E , as: 

, (Id-E) 1/2 p(Id-E) 1/2 

P > hi = ; 

P 1 - tr ( P E) 

When E is in an orthonormal form, we can use the fact that Id — Y e XeX e Y e is the projector on the 
space orthogonal to the space spanned by the vectors of E. Thus, 



(Id - E) 



1/2 



X E Y E (ld-Y?Y l 'Ylx\ 



»" 1/2 ^ E 



Id - XkYkYXxI 



E I E I E A. E 



= Id-X E Y E 



Id-(ld-E 2 E ) 1/2 



Y E X E 



Using the above, we can write: 



p>E ± (xP ({Id - E) 1/2 X p Y p Y,p 
<xP 



X p X E 



-Y E 



Id - (Id -11 



Id 

2\!/2 

e) 



Y E k(X E ,X p ) } Y ^ P 



(2) 



We readily verify that when He = Id it gives the right formula p~ XYY^X^p. We can use in those cases 
a direct EVD approach (section 3.1) to obtain a simplified form. 



3 Approximating operators 

In this section, we describe the techniques used to computed low-rank approximations of linear operators in 
the feature space. In particular, we are interested in methods where the operator can be decomposed as: 



^2aMAiA\uj w XYHY^X^ 



(3) 



were XY is (or might be) orthonormal, i.e. Y^X^X^Y is the identity. 
In the following, we describe: 

• In Section 3.1, how to get an EVD decomposition of any linear operator of the form it = XAA'X'. 
This is useful in order to e.g. lower the rank and is needed or before removing feature space vectors 
from X. 

• In Section 3.2, we show how to update the EVD of a linear operator il with a low rank operator 
aMiAiA\u]. 



• In Section 3.3.2, we show how to remove feature vectors from X when we have an EVD it = XA'SA^X^. 
We use two techniques: 

— Null space method (Section 3.3.1) 

— Quadratic optimisation to find the subset of pre-images that minimise the reconstruction error 
(Section 3.3.2). 

3.1 Direct EVD 

In this section, we discuss how to get the orthonormal form of an operator written as il = XASA^X^ where 
S is a diagonal matrix. We first describe the case where S is positive semi-definite, before tackling the 
general case. 

This type of approach is useful in several cases, and the builder AccumulatorKernelEVD in KQP relies 
on this decomposition, since it represents Eq. (3) as 

( u x u n )\ ■.. ■.. ■.. ( u x u n y 

A n J \ 2-i n ) \ A n 

where S^ = diag (pn, . . . , a,). 

3.1.1 Semi-positive definite case 

Suppose we have il = XAA^X^ and we wish to transform it to an orthonormal form. To achieve this, we 
have to compute a thin EVD 

EDE^ = A^X^XA 

It is then straightforward to obtain the desired form by posing Y = AED^ 1 / 2 and £ = D 1 / 2 D 1 / 2 

XYYY^X^ = XAEE^A^X^ = XASA^X^ 

where the last equality can be shown has follows. Any vector y 6 "H can be written XAP + VQ where 
V^XA = 0. Then, 

XAEE ] A^X^y = XAEE^ A^X ] XA P + XAEE^A^X^VQ 

EDEf 



A^X^XA 

= XAA^X^XAP + XAA ] X ] VQ 



= XAA^X^y 
We also can show easily that XY is an orthonormal matrix 

Y^X^XY = D- 1/2 E^ (A ] X^XA) ED- 1 ' 2 = Id 
It is then possible to remove some pre-images using techniques from Section 3.3.2. 

3.1.2 General case 

In the general case, we have il = XASA^X^ which can be rewritten il = XAS 1 I 2 S 1 I 2 A^X^. That is, unless 
we use a real field and S is not semidefinite positive. In that case, we can still write 

ii = xbb^x^ - 2;tcc* t ;t t 



where 

B = A{S+ + S-) 1/2 

C = AS 1 ! 2 

where S± is the S matrix where negative (resp. positive) values are set to 0. We then use the approach above 
to compute an orthonormal decomposition of XBB^X' , and then, using the fact that the space defined by 
XYYX^ contains XC, 

il = XYYY^X^ - 2XCC^X^ 
= XY [£ - 2ZZ^\ rt#t 

with Z = Y^X^XC. We then have to compute another EVD for £ — 2ZZ', which will give the final form of 
il. 

3.2 Low-rank update of operators 

The problem is to compute a low rank approximation of 

il = J2 aMAiAluj 

i 

where Ik G M H . 

In the following, we consider just one update and we drop the i for more clarity. We further assume that 
we have a current approximation decomposition expressed as 

il= XYZHZ^X 

where 

(n) 

• X £ -MXj an d Y is a n x r matrix such that XY is orthonormal; 

• Z is a r x r unitary matrix. This matrix is used in order to avoid updating the potentially larger 
matrix Y when the list of pre-images remain the same; 

• £ is a diagonal matrix of rank r 

In order to be able to process incrementally the set of vectors U, we wish to compute at each step a rank 
one update of U 

it = il + aUAA^U^ ps it = X'Y'Z'H ( X'Y' Z') ] 

This problem is related to [2] that deals with incremental Kernel SVD, and we follow mainly the same 
approach. We use the following constraints: 

1. Keep the (relative) error e = it — it / ||il|| below a limit r\ (if possible, see below); 

2. Keep the rank r below the limit r max ; 

3. Keep the number of pre-images below a number cr where c > 1. 

3.2.1 Pre-computations 

We can write U as the direct sum 

UA = (Id - XYY^X^) UA +XY Y^X^UA (4) 

' v ' w 

The operator W can be computed explicitly as: 

W = Y^X^UA = Y f k (X, U) A (5) 



General case We can compute V^V as 

V^V = A^k(U,U)A-W^W 
which can in turn be used to compute 2 the (full) EVD of W^. 



and 



Special case U = X When UA is a linear combination of kernel vectors. In this case, we have V = 

W = Y^X^XA = Y^k {X, X) A 



Updating the operator Let us express UA as the direct sum of its projection onto the subspace spanned 
by XYY^X^ and its orthogonal. Since by definition W = (XY) UA, we can write UA as: 



UAAW ^V(XYY^X^UA + V) =V ( ( X V 

where Q, Qo and D such that ( Q Qo ) is unitary and 

VQDQ^V^ = VV f 
We can write 



Y 
Q 



WQD 1 / 2 WQ 
D 1 ' 2 



ii = V( XYZT, 1 / 2 



V 



X V 



Y 
Q 



Z 




and hence: 



EV2 




(6) 



iX + aUAA^rf = ( X V 



Y 

Q 



Z 




WQD 1 ' 2 WQ C 
D 1 / 2 



S \ f Z 
) \ 

WQD 1 / 2 WQ 
D 1 ' 2 



Y 
Q 



X V 



(7) 



Computing Q Since ( X V 



X V 



Y 
Q 



Y 
Q 

t 



should be an orthonormal matrix, we should have: 



X V 



Y 
Q 



(XY)^ 
Qtyt 



XY VQ 



Id 

Q^VQ 



where we used the fact that (XY) V = from Eq. (4). Hence, orthonormality is equivalent to 

Q^V^VQ = Id 



(8) 



Using the result of Section 3.1, we can compute the thin EVD CDC^ of k (V, V) and pose Q = CD~ X I 2 
which will verify both Eqs. (6) and (8) . Qo corresponds to the basis of the null space obtained using the 
same decomposition (note that if V = 0, Qo is the identity). 



2 Note that we could use a Cholesky decomposition k (U,U) = LL^ followed by a generalised SVD on L* A and W to find 
the EVD of W*. 



Updating We can use standard rank-one update techniques to update the decomposition; since Z is 
unitary, we can write 

Z 0\/S 0\fZ 0\ f ( WQD 1 ' 2 WQ \fWQD 1 / 2 WQ 
Id J \ J \ Id J +a { D 1 / 2 J \ D 1 ' 2 

Z 0\//S 0\ / Z^WQD 1 /' 2 WQo \ ( Z^WQD 1 / 2 WQq \ f \ / Z 

Id ) [\ oJ +a ^ D 1 / 2 ) { D 1 ' 2 ))\0 Id 

which is a rank p update of a diagonal matrix. Note that, as we cannot really compute V, we have to get 
back to an expression where IA appears in the first matrix: 

X V)| \l) = [X UA-XYW)[\ I 

X u ) ' Y ~ YW ^ 



AQ 



3.3 Reducing the pre-image set 



In this section, we describe the techniques used to reduce the number of pre-images. When a linear combin- 
ation of pre-images is possible, it is better to use the direct EVD approach described in Section 3.1. 

3.3.1 Null space method 

Suppose we have an operator il defined as 

ii = XYTY ] X^ 
and we wish to reduce the set of pre-images in X without loss. We suppose S is full rank. 

1. Remove the pre-images for which a line of YY,Y^ is null (and remove the corresponding column of Y). 

2. If the decomposition is not orthonormal, use a QR or LU decomposition to find the null space of X^ X, 
i.e. a full rank Z such that X^XZ = 0. We then remove n pre-images (see below) where n is the rank 
of Z. 

3. Finally, we remove non used pre-images like in step (1). 

Null space and pre-images We want to find X' and A such that X = ( X' X'A ) P where P is a 
permutation matrix. 

We have a basis Z for the null subspace of X^X. If z is in the null subspace, then 

ViX}'£z j X j = =► $>A- = 

3 

since ^ ,- z j^j belongs to the span of X. 

To chose among the pre-images, we chose to remove first those that are the less used, i.e. those for which 
\\Yj,\\ \\Xj\\ is minimum. We also have to ensure that Zj is not too small, i.e. is above ^11^11^. We then 
remove entries one by one using the pivoted Gauss algorithm. 



3.3.2 Quadratic Programming approach (Ll-optimisation) 

Another to remove some pre-images is to try to directly optimise the cost using an Li regulariser to set some 
rows of Y close to 0. Denoting A = YZ, we seek at minimising the difference 



E = WXAZA^X^ - XBTB^X^l 



a £ ll^-l 



Using L\ regularisation ensures that B is sparser than A - in particular, rows of B are close to (which 
means that the corresponding pre-images Xi can be removed). 

In the following, we suppose that A is of dimension r x n (i.e. r basis vectors and n feature vectors). 
Using the link between the trace and the Frobenius norm, we have 

E = tr(P (XASA^X^ - XBTB^X^)) 

Denoting a, = XA,iT}/ 2 and bi = XB,{T X ^ 2 the two sets of vectors (in the feature space), we can then 
rewrite E as 



E =tr Y^ ^ajaja] + bib\bjb] - 2aia\bjb 



,.h\ 



i -j 



=E( a ^) + (*&•) 2 



a l b 3 



The problem we want to solve is linked to the "reduced set" approach proposed in [5], where one seeks 
to minimise the following cost function (with L\ regularisation): 



minimise Ers = /^ v j ||% — bj 



\& 



(9) 



subject to \/i ^,; > max \Bij 
j 

The role of £i is to regularise the importances of feature vectors; we need to set A appropriately so that some 
rows of B are close to at the end of the optimisation. Finally, and differently from other approaches, we 
added a new constant, Vi, that ensures that K x Ers > E for some K > 0. We discuss both in the following. 



Relation with the reduced set approach We first check that minimising Ers solves our problem. The 
main difference is that E contains terms of the form b\bj and ajbj. However, they will tend to be will be 
close to since the feature vectors will be approximately orthogonal. 

Posing bi = Hi (fli + Cj) with a^J-Ci, we can first show that \ii must be equal to ||aj|| I ||<Zj|| + |[cj|| 

when Ers is minimised. Then, we can show that 



6 Ik; 



hf > (a\a x y + (bib,)' 



a]b, 



Now we have to prove that all cross terms (i ^ j) are minimised if we minimise the new objective function, 
which intuitively is ensured by the fact that Qi-Ldj. Denoting A.; = a, — bi, we have for i =/= j: 



(«!%) 2 + ($b j ) ' 



a i h 3 



4 a j + A I% 



A|Aj 



«]A, 



which is clearly bounded by K m&Xi ||Aj|| and hence by K'Ers- 

Our problem is thus to optimise Eq. (9) with Vj = Uj = \\a,j\\ 
B' % : = (TjBj, we can reformulate the optimisation problem as: 



= IS 



33 \' 



or equivalently, by posing 
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minimise Ens = / ^j^j — bj\\ + ^j (10) 

3 

subject to Mit^i > max cr^ 1 l-SL-l 

Setting A If B = A (trivial solution when A = 0), then to minimise the above equation, we set & = 
maxcr" \<jjAij(jj | and 

<i = A^max|aj /2 ^| 

i 

If we remove the i pre-image the error becomes 



4s = 4°s-Amax 



-r^l+ER /2 (<^-<< 



where A^ is A with the i th row set to zero. Hence, in order to remove the i th pre-image, we need to set A 
such that 






rr 1/2 /l-- 



>0 



where K = k (X , A"). If we want to remove (at least) to pre-images whose indices are in M, we want to have 

A> 



E, eM l^«| 2 E;= 1 ^ 3 4 J -^.j|Ai i | a 



SieM max j \ a iAij\ 
As an heuristic, we set M to be the set of indices of pre-images with , minimum E RS — E RS . 

Quadratic optimisation The quadratic programming problem can be solved using quadratic cone op- 
timisation. This is detailed in Appendix A. 

Re-estimation of parameters We project the old operator into the new space in order to minimise the 
error, i.e. 



u= (yBB^y^xA) 'EA^x^Byy^B^ 



4 Example 



# include <kqp / feature_ matrix/ dense. hpp> 

# include <kqp / kernel _evd/ incremental . hpp> 
# include <kqp / probabilities. hpp> 

int main(int, const char**) { 

// Compute a density at random 

// Definitions 
using namespace kqp; 
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typedef Eigen: :Matrix<double, Eigen: :Dynamic, Eigen: :Dynamic> Matrix; 
int dim = 10; 

// Creating an incremental builder 
IncrementalKernelEVD<DenseMatrix<double>> kevd ; 

// Add 10 vectors with on = 1 
for (int i = 0; i < 10; i++) { 

// Adds a random </>, 

Matrix m = Matrix: :Random(dim, 1); 

kevd. add(DenseMatrix<double>(m)) ; 
} 

// Get the result p « X Y D F f X^ 

DenseMatrix<double> mX; 

typename AltDense<double> : : type mY; 

Eigen: :Matrix<double , Eigen: : Dynamic, 1> mD; 

kevd.get_decomposition(mX, mY, mD) ; 

// Compute a kEVD for a subspace 

IncrementalKernelEVD<DenseMatrix<double>> kevd_event ; 
for (int i = 0; i < 3; i++) { 

// Adds a random <fi 

Matrix m = Matrix: :Random(dim, 1); 

kevd_event . add (DenseMatrix<double> (m) ) ; 
} 



// Compute some probabilities 

// Setup densities and events 
Density<DenseMatrix<double>> rho(kevd) ; 
Event<DenseMatrix<double>> event (kevd_event) ; 

// Compute the probability 

std::cout << "Probability = " << rho .probability (event) << std::endl; 

// Conditional probability 

Density<DenseMatrix<double>> rho_cond = event .project (kevd) .normalize () ; 

// Conditional probability (orthogonal event) 

Density<DenseMatrix<double>> rho_cond_orth = event .project (kevd, true) .normalize () ; 

return 0; 
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5 Conclusion 

This document described the Kernel Quantum Probability Library, that can be used to compute quantum 
events and density in the an arbitrary feature space and relies only on the definition of a kernel, i.e. of the 
inner product between any two feature vectors. 
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A QP Approach 

In this section, we derive a computationally efficient way to optimise Eq. (9). We show here how to transform 
this optimisation problem into a cone quadratic programming approach proposed in [1]. We handle both the 
complex and the real field cases. 

A.l Precomputations 

Writing a, = o~{A t {, Pi = TjB t i and K = X* X the gram matrix [r is the rank of the operator, n is the 
number of pre- images) , we have 

r 

e rs = y, p\p* - 2 ^ KAr) + A ^ 

9=1 
r 

= Y J PlKP q ~m(a\Kp q )+\i q 

9=1 

We get back to a real case by posing j3 q = /? ' + i/3 1 ' and a q = a' + ia q . Dropping q for clarity, we have 

ftK (3 = /3* A/3' + p'^Kp" + i {fi lJ{ K(i" - /3' n K(3') 

= 0*81 (K) p' + p"1& (K) (3" + i (p'^K/3" - p*tfp") 
= /3*SR (K) p + /3"*M(K) P" - 2/3*3 (A') /3" 

12 



and 



5R (cJK0) = 5R (c/t A/3' + a'^Kfi" + ia'^Kp" - ia' n Kp') 

= (c/tsR (A) + o"t9f (X)) /3' + (a^K (A') - a*9i (A)) 0' 1 



Hence 



^A-2»(at^)+^= / ^- V ' ^ W S(X) ^ ' * 



i ^ y v s(A) K(A) y v % 



n 
a" 



If we let 



( p\ ■■■ # 6 



e, 



$ 



with /3j = I „*, I in the complex case and Pi = (3[ in the real one. 

We require that both the real and imaginary part be inferior to £, i.e. that 



5R(A) 9(A) 
-3(A) SR(A) 



# 

^ 



Vi6l...n ) Vgel...r,i/,(±/3' gi ±^)+f j >0 

where z/ q are weights associated to basis vectors in the feature space, and x has a length n x (r' + 1). Our 
problem can be expressed as a cone quadratic problem 



minimise x^ Hx + 2c* x 
subject to Gx < 



Denoting Id n the matrix ( Id n ■ ■ ■ Id n )t and 1^ the matrix (1 • • • 1 )t, we can identify: 



# 



A' v 



/ -K" ai \ 

-K"0t r 



G = 



-S -Id 
S -Id 



n 
(r') 



where diag r repeats the matrix r times in the diagonal where 

S = diag (viG , ■ ■ ■ VlG ) 
with Go defined latter. 

Case K = R In the case where K = R, we have K' = K" = K and r' = r and Gq = Idn 
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Case IK = C we have r' = 2r and 



K' = 

K" = 
Go = 



5R (A) -3 (A) 
-3? (A) R(K) 

*ft (A) % (K) ' 

-3 (A') K (A) 

Id n Id„ 

Id n -Idn 



A. 2 Pre-solving the system 

In order to speed up, we need to solve the linear systems defined by 

H G^ 
G V 

where V is a diagonal negative matrix. With a bit of re-ordering, this gives 



/ diag r A'' -Idnr 
-S -U 

S 

v or }t 



02nr 
(r')t 



-Id 



id\:' ] 

\j nr i 
-V 



,(r') 



- JVM , 

V v I 



-id { :' ] 





/ 



d 

w 



where [/ and V are positive semi-definite (diagonal) matrices of size 2nr. 

We want to perform a LDL^ decomposition of this matrix (a D-Cholesky) . Given the structure of the 
above matrix, we decompose these matrices as 



£ 



£21 £22 
£31 £32 £33 
\ £41 L42 £43 £44 / 



( Jh 



and D = 



Do 



D, 



Di J 



Solving Ln A Cholesky decomposition of A', AA^ = A' gives 

Ln = diag (A • ■ • A ) and £>i = Id 
Note that in the complex field case, we can decompose the problem into 

A U A\ 1 = ?H(K) 

A 21 A\ 1 = -Z(K) 
A 22 Al 2 = ^(K)-A 21 Al 1 



Solving L21 and £31 We now have 



£21 



£1 



-S 



£31 , 
where S = diag (^iGo, . . . viGq). This can be solved straightforwardly by first solving 3 BA^ = Go . 

£21 — diag ( —v\B ■■■ —v r B ) and £31 = diag ( v\B ■■■ v r B ) 



5 Note that BB^ is positive definite since BB* = A ^GqA 1 where Go is positive definite 
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Solving L-22 We have to solve L 22 B\ 2 L 22 = -U - L?,\L\ X = -diag ((U t + vfBB^).). Since Ui + vfBB^ is 
positive definite, it is sufficient to solve the r Cholesky decompositions L 22 L 22 = Ui + vfBB^ with 



diag ( lJ$ ■ ■ ■ L { 22 ] ) and D 2 



Solving L32 Then, we find L32 by solving L 32 D 2 L\ 2 = —L 3 \L* 21 which can be solved by solving the r 
systems L$L$ = -vfBB\ 



L 32 = diag ( 1$ ■ ■ ■ £32 



Solving L 33 For i 33 , we have 



L 33 D 3 L 33 — L 32 L 32 + L 3 \L 31 — —V 
which can be solved by r D-Cholesky decomposition (all the matrices are block diagonal): 

L 3 M ] L 3 ? = -Vt - tBtf + L^L 3 f (11) 

In order to simplify the computation, let us prove that the right hand side is negative definite; we have 

T 00 T 00 1 _ ,,2 R R f T (0-1 T (0-t R Rf 

-^32 ^32 — ~ V i ati h 22 L 22 Ut > 

= -vfBB^ (Ui + BB ] ) _1 BB^ 

= vfBB^ -Ui + Ui (Ui + BB^y 1 Ui 

We have 

Uf 1 (Ui + isfBB^) Ur 1 = ur 1 + ufU^BB^Ur 1 >U^ 1 > Q 
using the partial order of definite matrices and its properties. This implies that 

Ui >U t (U t + v?BB^y 1 U t >0 

and hence —Ui + Ui (Ui + vfBB^ Ui is positive semi-definite, which in turn implies that the right hand 
side of Eq. (11) is negative definite. This shows that we can find L33 using a Cholesky decomposition 

L 33 L 33 =Vi + v i BB 1 - L 32 L 32 
and that D 3 = Id rn . 

Solving L41, £42 and L43 For the fourth row, we first have trivially L41 = 0. 

To solve L i2 D 2 L* 22 = ( — Id n ■ ■ ■ —Id n ) , we have to solve L^ 2 L 22 = Id n for i = 1 . . . r, and then 

L42=(4 1 2 ) ••• L$ 

To find L43, we have to solve L43L33 = Id n — £42 £32 for i = 1 . . . r and 

^ = (z« ■•• 4?) 

Finally, the last equation Li 2 D 2 L\ 2 -\- Li 3 D 3 L\ 3 -\- L^D^L*^ = can be solved by computing the Cholesky 
decomposition 
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L AA L\ A = ^L$L$ + L&L® 1 with D A = Id n 



A. 3 Solving the linear system 

Here is the final structure of the decomposition: 



L 



D 



-Idl 



\ 



—L21 L22 

L21 L32 L33 

\ L42 £43 L44 J 

1 Q^ri.r 



Id-r, 



with 



L 2 i = diag ( —v-lB 



~B ) 



which gives the following systems to solve: 



Ax' 



(>i 



-(i) 

j 22 



_L/QQ i, 



z[ = -ViBx'i - c t 



- v i Bx i — L Z2 z l - di 



Lay' 



y ;L 43 t 



(i)*i 



T {t) z' 



and finally 



L \iV = y' 




rWty.. _jj 
■^33 ^ — l i 


^43 y 


L^yi = y'i 


rWt, 
-^32 l 



^42 y 

A^Xi = x\ — ViB'ti + ViB* Zi 
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